
feat(networks): add load_osm_network and load_sumo_network #154

Open
FrancescoUmberto wants to merge 1 commit into c2g-dev:main from FrancescoUmberto:pr/sumo-integration

@FrancescoUmberto

@FrancescoUmberto FrancescoUmberto commented Apr 24, 2026

Description


This PR adds a new networks module that enables importing road networks from two widely-used file formats: OpenStreetMap (.osm, .osm.gz) and SUMO (.net.xml).

Previously, city2graph could only load geospatial data from Overture Maps (via API) or GTFS feeds. Users working with locally stored OSM exports or SUMO simulation networks had no supported import path.

Approach:


  • New module city2graph/networks.py exposing two public functions that follow the same (nodes_gdf, edges_gdf) | nx.Graph convention used throughout the library (matching travel_summary_graph in transportation.py)
  • load_osm_network(path, *, retain_all, simplify, as_nx, directed) - delegates parsing to osmnx.graph_from_xml (already a core dependency), then normalises the result into indexed GeoDataFrames with CRS EPSG:4326
  • load_sumo_network(path, *, as_nx, directed) - parses .net.xml using stdlib xml.etree.ElementTree (no new hard dependency), extracts junctions as nodes and edges with lane-level attributes, and reprojects from SUMO's local Cartesian CRS to EPSG:4326 via pyproj (transitive dependency through geopandas)
  • Optional dependency group sumo = ["sumolib>=1.20.0"] added to pyproject.toml for users who need advanced SUMO network introspection beyond what the XML parser provides
  • Both functions support directed and as_nx flags, skip internal SUMO elements, and fall back gracefully when projection or geometry data is missing
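As a rough illustration of the .net.xml parsing described in the list above, here is a minimal, dependency-free sketch. It is not the PR's implementation: the function name parse_net, the output layout, and the sample data are hypothetical, and it stops before reprojection and GeoDataFrame construction. It relies only on the standard SUMO network layout, in which internal edges carry function="internal" and internal junctions carry type="internal".

```python
import xml.etree.ElementTree as ET

# Tiny hand-written network in SUMO .net.xml layout (illustrative data only).
NET_XML = """
<net>
  <edge id=":J1_0" function="internal">
    <lane id=":J1_0_0" index="0" speed="13.89" length="0.10" shape="10.00,0.00 10.10,0.00"/>
  </edge>
  <edge id="E0" from="J0" to="J1">
    <lane id="E0_0" index="0" speed="13.89" length="10.00" shape="0.00,0.00 10.00,0.00"/>
    <lane id="E0_1" index="1" speed="13.89" length="10.00" shape="0.00,3.20 10.00,3.20"/>
  </edge>
  <junction id="J0" type="dead_end" x="0.00" y="0.00"/>
  <junction id="J1" type="priority" x="10.00" y="0.00"/>
  <junction id=":J1_0_0" type="internal" x="10.05" y="0.00"/>
</net>
"""


def parse_net(root):
    """Hypothetical helper: return (nodes, edges) as plain dicts."""
    nodes = {}
    for junction in root.iter("junction"):
        if junction.get("type") == "internal":
            continue  # skip junctions that only exist inside intersections
        nodes[junction.get("id")] = (float(junction.get("x")), float(junction.get("y")))

    edges = {}
    for edge in root.iter("edge"):
        if edge.get("function") == "internal":
            continue  # skip intra-junction connector edges
        lanes = edge.findall("lane")
        if not lanes:
            continue  # no lane-level attributes to summarise
        edges[edge.get("id")] = {
            "from": edge.get("from"),
            "to": edge.get("to"),
            "num_lanes": len(lanes),
            # Assumption: summarise lane speeds by their maximum.
            "speed_limit": max(float(lane.get("speed")) for lane in lanes),
            # Assumption: take the edge length from the first lane.
            "length": float(lanes[0].get("length")),
        }
    return nodes, edges


nodes, edges = parse_net(ET.fromstring(NET_XML))
```

In this sketch the node coordinates are still in SUMO's local Cartesian frame; converting them to EPSG:4326 would be a separate step.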

Related Issues

N/A

Checklist

  • I have read the Contributing Guide.
  • I have updated the documentation, if necessary.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Pre-commit checks passed locally.

@yu-ta-sato
Collaborator

yu-ta-sato commented Apr 24, 2026

@FrancescoUmberto thanks so much for the pull request!

First of all, I like the idea of enhancing the data source inputs.
Before getting into the code review, let me share a few thoughts:

  1. Handling of .osm files
    City2Graph (c2g) already provides compatibility with OSMnx (ox) for OSM datasets. Because ox is a well-maintained package that includes dedicated operations for validating and simplifying OSM street segments, c2g's design deliberately leaves handling OSM from scratch out of scope. In my view, using ox.graph_from_xml() directly is therefore sufficient, since the ox object is directly compatible with c2g.

  2. Location of scripts
    As these scripts import specific data sources, the SUMO operations would fit better in /transportation than in a new /network module.

  3. Naming convention
    I am also trying to avoid ambiguous use of the term "network". Where it is equivalent to "graph" and no distinction is needed, this package uses "graph". So other names would be better for these functions, helpers included (e.g., load_sumo_network -> load_sumo_data).

  4. Avoid monoliths
    If you convert the loaded data into c2g's graph format of nodes and edges, it would be better to avoid a monolithic approach and split the work into loading and construction (cf. load_gtfs -> travel_summary_graph).
    For SUMO, I would suggest loading and preprocessing the raw XML into handy objects (e.g., importing into DuckDB). It would also help if you could clarify what the nodes and edges in the constructed graph would be. Based on that definition, we can choose an appropriate name for the graph construction function. During graph construction, please use the existing public APIs or internal helpers as much as possible to avoid duplicated operations.

I'm looking forward to hearing more info and your thoughts for the next actions!

@FrancescoUmberto
Author

Thank you for the feedback. Your points make sense, and I agree that clarifying the architecture before moving further is the right next step.

Regarding the .osm handling, I agree that City2Graph should continue to rely on OSMnx for OSM ingestion rather than introducing a separate parsing layer. Since ox.graph_from_xml() already provides validation, simplification, and a graph object that is compatible with c2g, my intention is to keep OSM support delegated there and only handle the conversion into the internal c2g representation when needed.

For the module location, your suggestion to place the SUMO importer under the transportation package is reasonable. Since this work is specifically about importing transportation-related sources, moving it into that namespace would align better with the existing package structure and avoid introducing an unnecessary /network module.

I also agree on the naming concern around the term network. To keep the terminology consistent with the rest of the package, I will avoid using network where graph is the intended concept. In practice, I am planning to separate the workflow into two explicit steps:

  • load_sumo_data() for reading and preprocessing the raw SUMO XML
  • build_sumo_graph() for constructing the c2g graph from the parsed objects

This should make the responsibilities clearer than a single monolithic load_sumo_network() function.

On the graph definition itself, my current assumption is:

  • nodes → SUMO junctions / intersections
  • edges → road segments connecting those junctions

This seems to be the most natural mapping into the existing c2g graph model, but I would like to validate that this matches the intended abstraction before finalizing the construction logic.

For the loading stage, I am considering keeping the preprocessing layer lightweight by converting the raw XML into intermediate structured objects first (for example dictionaries or tabular objects), so the graph construction stage can remain independent from the source format. During graph construction I will reuse the existing public graph APIs and internal helpers as much as possible so that we avoid duplicating logic already implemented elsewhere in c2g.

My proposed next steps are:

  1. Move the SUMO importer into transportation/
  2. Split loading and graph construction into separate functions
  3. Confirm the node/edge semantics for the SUMO-derived graph
  4. Reuse existing graph helpers wherever possible
  5. Keep OSM file support delegated to OSMnx

If this direction matches your expectations, I can prepare the revised structure accordingly and share the updated implementation for review.

@yu-ta-sato
Collaborator

Thanks for the further comments!

I understand that the nodes and edges would be junctions and the segments between them, like a primary graph of streets. I assume users would run SUMO to obtain simulation outcomes with summary metrics (e.g., simulated traffic volumes, travel times, waiting times, etc.).

If so, could these be mapped onto the nodes and edges as attributes? I think SUMO users would expect c2g to extend their simulation outcomes to spatial analysis with GeoPandas, network analysis with NetworkX (rustworkx), and GNNs with PyTorch Geometric. As I'm not a SUMO expert, I am not sure how the attributes could be mapped to nodes and edges, but if such summary metrics could be flexibly mapped onto c2g's graph structure, that would be awesome!

If this design is applied, the function could be named simulation_summary_graph, as it could later be generalised to other simulation tools (e.g., MATSim).

@FrancescoUmberto
Author

We can work on this. I will explain what I have in mind in more detail by the end of this weekend.

I expect that it will be possible to add simulation data to the graph.

@FrancescoUmberto
Author

Good morning.

I agree that the real value here is not only converting the SUMO road topology into a c2g graph, but also making the simulation outputs available as graph attributes so that users can immediately reuse the resulting graph for downstream spatial and network analysis.

Based on your suggestion, the graph structure would remain:

  • nodes → SUMO junctions / intersections
  • edges → road segments between junctions

and the simulation metrics could then be attached to those nodes or edges as attributes, for example:

  • edge travel time
  • average speed
  • density
  • occupancy
  • vehicle counts

This would make the resulting graph directly usable with libraries such as GeoPandas, NetworkX, or PyTorch Geometric without requiring users to manually remap the SUMO outputs afterward.
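To make the idea of attaching metrics as attributes concrete, here is a minimal sketch using plain dictionaries in place of GeoDataFrames. The helper name attach_metrics and the sample values are hypothetical; an actual implementation would presumably join on the edge index of a GeoDataFrame instead.

```python
# Static edge attributes keyed by SUMO edge id (illustrative values).
edges = {
    "E0": {"from": "J0", "to": "J1", "length": 10.0},
    "E1": {"from": "J1", "to": "J2", "length": 25.0},
}

# Per-edge simulation metrics; E1 has no record, e.g. because no vehicle used it.
edge_metrics = {
    "E0": {"traveltime": 9.0, "speed": 11.0},
}


def attach_metrics(edges, metrics):
    """Hypothetical helper: merge metrics into edge attributes by edge id."""
    # Edges without a metrics record keep only their static attributes.
    return {
        edge_id: {**attrs, **metrics.get(edge_id, {})}
        for edge_id, attrs in edges.items()
    }


enriched = attach_metrics(edges, edge_metrics)
```

The merge builds new dictionaries, so the original edge records are left unchanged.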

Initial implementation scope


  • net.xml → graph topology
  • edgeData.xml → edge-level simulation metrics

The reason is that edgeData.xml already provides values aggregated per edge, so the mapping into the c2g graph is straightforward and avoids introducing premature complexity around trip-level or lane-level aggregation.
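A minimal sketch of reading such a file with the standard library is shown below. The helper name parse_edge_data, the choice of metrics, and the sample values are assumptions made for illustration; only the interval/edge nesting and the attribute names follow SUMO's edge-based mean-data output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the layout of SUMO edge-based mean data (illustrative values).
EDGE_DATA_XML = """
<meandata>
  <interval begin="0.00" end="300.00" id="dump">
    <edge id="E0" sampledSeconds="120.00" traveltime="8.50" density="4.20" occupancy="1.10" speed="11.80" entered="14" left="13"/>
  </interval>
  <interval begin="300.00" end="600.00" id="dump">
    <edge id="E0" sampledSeconds="90.00" traveltime="9.50" density="3.00" occupancy="0.90" speed="10.20" entered="10" left="11"/>
  </interval>
</meandata>
"""

# Assumption: the subset of metrics worth keeping in a first implementation.
METRICS = ("traveltime", "speed", "density", "occupancy", "entered")


def parse_edge_data(root, metrics=METRICS):
    """Hypothetical helper: one long-form row per (edge, interval)."""
    rows = []
    for interval in root.iter("interval"):
        begin = float(interval.get("begin"))
        end = float(interval.get("end"))
        for edge in interval.iter("edge"):
            row = {"edge_id": edge.get("id"), "begin": begin, "end": end}
            for name in metrics:
                value = edge.get(name)
                # An attribute may be missing from a given record; keep it as None.
                row[name] = float(value) if value is not None else None
            rows.append(row)
    return rows


rows = parse_edge_data(ET.fromstring(EDGE_DATA_XML))
```

Each row is already keyed by edge id and interval start, so it can be matched to the topology from net.xml without any trip-level or lane-level aggregation.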

The workflow would then be separated into two stages:

  1. load_sumo_data(): Parse the SUMO files into intermediate structured objects;
  2. simulation_summary_graph(): Construct the c2g graph and map simulation metrics into graph attributes.

That would keep the loading logic separate from the graph construction logic while leaving room to support other simulation frameworks later, such as MATSim, under the same higher-level interface.

At the end of this initial step, I expect usage to look like this:

raw_data = load_sumo_data(network_file, metrics_file) 
graph = simulation_summary_graph(raw_data)

For a first implementation, my suggestion would be:

  • keep OSM handling delegated to OSMnx
  • move the SUMO importer under transportation/
  • support net.xml + edgeData.xml first
  • map simulation metrics onto edge attributes
  • design the graph builder so it can later be generalized to other simulators

Once that foundation is in place, we could then extend the importer to support more complex SUMO outputs such as laneData.xml or tripinfo.xml if needed.

If this scope sounds good, I can prepare the revised structure around that initial implementation.

@yu-ta-sato
Collaborator

yu-ta-sato commented Apr 25, 2026

That sounds nice to me!

If so, load_sumo_data() could return a standardised object shared by all simulation tools added in the future, such as MATSim. As a skeleton, I would propose the following I/O:

def load_sumo_data(
    network_file: str | Path,
    node_metrics_file: str | Path | None = None,
    edge_metrics_file: str | Path | None = None,
    *,
    crs: str = "EPSG:4326",
    ) -> dict[str, gpd.GeoDataFrame | pd.DataFrame | None]:

This would parse raw SUMO outputs into a standardised, simulator-agnostic intermediate representation, so that future loaders for other simulators (e.g., load_matsim_data()) can return the same schema and reuse simulation_summary_graph() without any change. The return value could look like this:

{
    "nodes":
        gpd.GeoDataFrame indexed by node_id, with junction geometry (Point) or area coverage (Polygon),

    "edges":
        gpd.GeoDataFrame indexed by (from_id, to_id, key), where key is the source-stable edge id (e.g., edge_id for SUMO, link_id for MATSim, ...).
        Each row carries segment geometry (LineString) and static edge attributes (length, num_lanes, speed_limit, modes, ...). The triple index matches the (u, v, key) convention used by OSMnx, so parallel edges are preserved,

    "node_metrics":
        pd.DataFrame | None indexed by node_id (or (node_id, time)) with simulation metrics in long form,

    "edge_metrics":
        pd.DataFrame | None indexed by (from_id, to_id, key) (or with an extra time level) carrying e.g. travel_time, mean_speed, volume, density, occupancy, ...,

    "metadata":
        dict with source format, time window, units, CRS, ...,
}
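The reason for the triple index can be shown with a small sketch that uses plain dictionaries as a stand-in for the proposed GeoDataFrame index. The edge ids and attribute values are made up for illustration.

```python
# Two parallel edges between the same pair of junctions, told apart by their key.
edges = {
    ("J0", "J1", "E0"): {"length": 10.0, "num_lanes": 2},
    ("J0", "J1", "E0_alt"): {"length": 14.5, "num_lanes": 1},
}

# Keying by (from_id, to_id) alone makes the second record overwrite the first.
collapsed = {(u, v): attrs for (u, v, _key), attrs in edges.items()}

print(len(edges), len(collapsed))
```

With the key level included, both parallel edges survive; without it, only one record remains for the junction pair.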

Then the shared simulation_summary_graph() could be:

def simulation_summary_graph(
    sim_data: dict,
    *,
    node_metrics: list[str] | None = None,
    edge_metrics: list[str] | None = None,
    aggregation: str | dict[str, str] = "mean",
    as_nx: bool = False,
    directed: bool = True,
    ) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame] | nx.MultiDiGraph | nx.MultiGraph:

Note that the aggregation parameter specifies how to collapse time-indexed metrics into a single value per node/edge: "mean", "median", "sum", or a per-column mapping.
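One way this collapse could behave is sketched below with the standard library only. The helper name aggregate_metrics, the row layout, and the sample values are hypothetical; a pandas-based version would more likely use a groupby over the index levels.

```python
import statistics

# Supported aggregation names, mirroring the options listed above.
AGGREGATORS = {"mean": statistics.fmean, "median": statistics.median, "sum": sum}


def aggregate_metrics(rows, key_field, metric_names, aggregation="mean"):
    """Hypothetical helper: collapse time-indexed rows to one value per key."""
    grouped = {}
    for row in rows:
        per_key = grouped.setdefault(row[key_field], {})
        for name in metric_names:
            if row.get(name) is not None:  # ignore missing observations
                per_key.setdefault(name, []).append(row[name])

    result = {}
    for key, columns in grouped.items():
        result[key] = {}
        for name, values in columns.items():
            # A dict selects the aggregation per column; a string applies to all.
            how = aggregation[name] if isinstance(aggregation, dict) else aggregation
            result[key][name] = AGGREGATORS[how](values)
    return result


rows = [
    {"edge_id": "E0", "begin": 0.0, "traveltime": 8.0, "entered": 14.0},
    {"edge_id": "E0", "begin": 300.0, "traveltime": 10.0, "entered": 10.0},
]

# Average the travel time over intervals, but total the vehicle counts.
summary = aggregate_metrics(
    rows, "edge_id", ["traveltime", "entered"],
    aggregation={"traveltime": "mean", "entered": "sum"},
)
```

The per-column mapping matters because rate-like metrics such as travel time are usually averaged over intervals, whereas counts such as entered vehicles are usually summed.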

My apologies for the confusion: I previously said these operations could live in /transportation, but I have realised they are better suited to /mobility, as some simulation tools focus not simply on transport but on activity-based mobility.

Does this direction match what you had in mind?
